What can Visual Speech Synthesis tell Visual Speech Recognition?
Abstract
We consider the problem of speech recognition given visual and auditory information, and discuss some of the ways that speech synthesis can provide assistance. Three possible contributions of synthetic visual speech are discussed: first, the use of synthetic speech to study human speech perception; second, the use of speech synthesis techniques to instantiate models of human speech production; and third, the use of these production models to help guide automatic speech recognition. Finally, we consider the reverse relationship: how can the techniques of automatic speech recognition assist in better visual
Similar resources
Inventory-Based Audio-Visual Speech Enhancement
In this paper we propose to combine audio-visual speech recognition with inventory-based speech synthesis for speech enhancement. Unlike traditional filtering-based speech enhancement, inventory-based speech synthesis avoids the usual trade-off between noise reduction and the resulting speech distortion. For this purpose, the processed speech signal is composed from a given speech inventory whi...
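The core idea of composing the output from an inventory can be illustrated with a minimal Python sketch, assuming a per-frame nearest-neighbour rule: each noisy feature frame is replaced by the closest clean frame from a pre-recorded inventory, so the output contains only clean material and no filtering distortion. The function names, the Euclidean distance and the frame-by-frame selection are assumptions for illustration; the truncated abstract does not say how the paper matches frames (or how it uses the visual stream), and practical unit-selection systems also add join costs between consecutive units.

```python
import math
from typing import List, Sequence


# Hypothetical helpers; the paper's actual selection rule is not given.
def select_from_inventory(noisy: Sequence[Sequence[float]],
                          inventory: Sequence[Sequence[float]]) -> List[int]:
    """For each noisy feature frame, return the index of the nearest clean
    inventory frame (Euclidean distance, frame by frame, no join costs)."""
    return [
        min(range(len(inventory)), key=lambda i: math.dist(frame, inventory[i]))
        for frame in noisy
    ]


def compose_output(indices: Sequence[int],
                   inventory: Sequence[Sequence[float]]) -> List[List[float]]:
    """Assemble the enhanced output purely from clean inventory frames."""
    return [list(inventory[i]) for i in indices]


if __name__ == "__main__":
    inventory = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]   # toy clean frames
    noisy = [[0.9, 1.2], [4.0, 5.5], [0.1, -0.2]]      # toy noisy frames
    idx = select_from_inventory(noisy, inventory)
    print(idx)                           # [1, 2, 0]
    print(compose_output(idx, inventory))
```

Because every output frame is copied from the inventory, noise reduction never trades off against distortion of the signal itself; the cost moves instead into choosing the right unit.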
A hidden Markov model based visual speech synthesizer
This paper describes a hidden Markov model (HMM) based visual synthesizer designed to assist persons with impaired hearing. This synthesizer builds on results in the area of audio-visual speech recognition. We describe how a correlation HMM can be used to integrate independent acoustic and visual HMMs for speech-to-visual synthesis. Our results show that an HMM correlating model can significantly...
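As a rough illustration of HMM-based speech-to-visual mapping, the sketch below decodes the most likely acoustic HMM state sequence with the Viterbi algorithm and then looks up one visual parameter per decoded state. This is a simpler state-to-parameter lookup, not the correlation HMM the paper describes; the two-state model, the discretised "low/high energy" observations and the `MOUTH_OPENING` table are hypothetical.

```python
import math
from typing import List, Sequence


def _log(p: float) -> float:
    """Natural log that maps zero probability to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: Sequence[int],
            start: Sequence[float],
            trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]]) -> List[int]:
    """Most likely hidden-state path of a discrete-observation HMM (log space)."""
    n = len(start)
    # delta[s]: best log-probability of any state path ending in s at time t
    delta = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    backptrs: List[List[int]] = []
    for o in obs[1:]:
        ptr, new_delta = [], []
        for s in range(n):
            best_prev = max(range(n), key=lambda r: delta[r] + _log(trans[r][s]))
            ptr.append(best_prev)
            new_delta.append(delta[best_prev] + _log(trans[best_prev][s]) + _log(emit[s][o]))
        backptrs.append(ptr)
        delta = new_delta
    # Backtrack from the best final state
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical visual parameter per acoustic state (e.g. lip opening, 0..1)
MOUTH_OPENING = {0: 0.1, 1: 0.8}


def speech_to_visual(obs, start, trans, emit) -> List[float]:
    """Map each decoded acoustic state to its visual parameter."""
    return [MOUTH_OPENING[s] for s in viterbi(obs, start, trans, emit)]


if __name__ == "__main__":
    start = [0.6, 0.4]                  # state 0 "closed", state 1 "open"
    trans = [[0.7, 0.3], [0.3, 0.7]]
    emit = [[0.9, 0.1], [0.2, 0.8]]     # obs 0 = low energy, 1 = high energy
    obs = [0, 1, 1, 0]
    print(viterbi(obs, start, trans, emit))           # [0, 1, 1, 0]
    print(speech_to_visual(obs, start, trans, emit))  # [0.1, 0.8, 0.8, 0.1]
```

A correlating model as described in the abstract would instead couple the acoustic and visual HMMs, so visual output could depend on more than the single decoded acoustic state.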
A Weighted Discrete KNN Method for Mandarin Speech and Emotion Recognition
The speech signal is a rich source of information and conveys more than spoken words; this information can be divided into two main groups: linguistic and nonlinguistic. The linguistic aspects of speech include the properties of the speech signal and the word sequence, and deal with what is being said. The nonlinguistic properties of speech have more to do with talker attributes such as age, gender, dialect, and emot...
Speech Production in Noisy Environments and the Effect on Automatic Speech Recognition
Speech is bimodal in nature and includes the audio and visual modalities. In addition to acoustic speech perception, speech can also be perceived using visual information provided by the mouth/face (i.e., automatic lipreading). In this study, visual speech production in noisy environments is investigated. The authors show that the Lombard effect plays an important role not only in audio spe...
Recognition of Visual Speech Elements Using Adaptively Boosted Hidden Markov Models (Published Version)
The performance of an automatic speech recognition (ASR) system can be significantly enhanced with additional information from visual speech elements such as the movement of the lips, tongue, and teeth, especially in noisy environments. In this paper, a novel approach for recognition of visual speech elements is presented. The approach makes use of adaptive boosting (AdaBoost) and hidden Markov mode...
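To show what adaptive boosting contributes, here is a minimal sketch of one round of standard discrete (binary) AdaBoost: the weak classifier's weighted error sets its vote weight, and misclassified samples are up-weighted so the next weak classifier concentrates on them. In the paper's setting the weak classifiers would be HMMs, but the abstract is truncated before the training scheme, so this textbook binary form and the name `adaboost_round` are assumptions; recognising many visual speech elements would need a multiclass variant.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_round(weights: Sequence[float],
                   correct: Sequence[bool]) -> Tuple[float, List[float]]:
    """One round of discrete (binary) AdaBoost.

    weights: current sample weights, summing to 1.
    correct: True where the weak classifier (e.g. an HMM) labelled the sample correctly.
    Returns (alpha, new_weights): the classifier's vote weight and the
    renormalised sample weights for the next round.
    """
    err = sum(w for w, c in zip(weights, correct) if not c)
    if not 0.0 < err < 0.5:
        raise ValueError("weak classifier error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Down-weight correctly classified samples, up-weight the mistakes
    updated = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


if __name__ == "__main__":
    alpha, w = adaboost_round([0.25] * 4, [True, True, True, False])
    print(round(alpha, 4), [round(x, 4) for x in w])
    # 0.5493 [0.1667, 0.1667, 0.1667, 0.5]
```

After this round the single misclassified sample carries half of the total weight. The final boosted decision is a vote over all rounds in which each weak classifier counts with its `alpha`.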